Correlated Wishart Matrices and Critical Horizons 
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We discuss a practical method to determine the eigenvalue spectrum of the empirical correlation 
matrix. The method is based on the analysis of the behavior of a conformal map at a critical horizon 
which is defined as a border line of the physical Riemann sheet of this map. The map is a convenient 
representation of the Marcenko-Pastur equation. 



The relation between the eigenvalue spectrum of the 
empirical covariance matrix and the spectrum of the un- 
derlying covariance matrix plays an important role in 
many research areas ranging from physics, telecommu- 
nication MJ, to information theory |2| and quantitative 
finance p| , and has attracted a great attention recently. 

The problem can be mathematically formulated as fol- 
lows. Suppose we have a statistical system with N de- 
grees of freedom x = (xj), i = 1, . . . ,N. We collect T 
independent samples of x, each being a certain realization 
X Q = (Xi) a of x, a — 1, . . . , T. The data are stored in a 
rectangular matrix X = (Xi a ) of dimension N x T. The 
covariance matrix^^l C: CV, = ( be estimated 

from the data as c = (l/T)XX r : c tj = (1/T) £ t X it X ju 
where c is empirical correlation matrix. Here X r stands 
for the transpose of X. For r = N/T — > (iV=const, 
T — > oo) the empirical covariance matrix c perfectly ap- 
proximates C. However, in practice, the number of sam- 
ples T is finite. One is therefore interested in the relation 
between c and C, in particular how well the eigenvalue 
distribution of c approximates the eigenvalue distribution 
of C for finite r. 

One can formulate the problem in terms of random 
matrix theory 0,0- One can think of c = (l/T)XX r 
as a matrix constructed of real^^ gaussian random ma- 
trix X. The only requirement which one has to impose 
on the probability measure for X is that the two-point 
correlations are: 



(XicXjfj) — 5 a pCi. 



(1) 



Such an ensemble of matrices X is called correlated 
Wishart ensemble 0, 0, El- The delta 5 a p in the last 
equation tells us that the samples are uncorrelated, while 
Cij that the degrees of freedom are correlated according 
to the covariance matrix C. 

The relation between the spectral density of c and C is 
given by the Marcenko-Pastur equation Q. This equa- 
tion has been intensively studied in the mathematical 
literature 0, || . The corresponding equations for corre- 
lated samples, that is for the case when the delta S a p is 
replaced by a sym metric matrix A a p in Q , have been de- 
rived recently using a diagrammatic technique P.ITl|. 

The purpose of the present paper is twofold. Firstly, 
we want to describe a practical method to calculate the 
eigenvalue spectrum of the estimator c. Some similar 
methods have been discussed in the literature [a 13 ■ How- 
ever, we believe that a conformal map representation 
[j| used here leads to a particularly simple and effec- 



tive practical method. Secondly, the eigenvalue smearing 
and noise dressing observed in the empirical correlation 
matrix c have a simple interpretation in terms of the 
conformal map, which we want to present. 

In order to determine the relation between the eigen- 
value densities of the covariance matrices c and C: p c (x) 
and pc{x) it is convenient to introduce Green's func- 
tions (resolvents): G(z) = (l/iV)Tr (z — C) _1 , where 
z is a complex variable, and correspondingly g(z) = 
(l/7V)(Tr [z — c) _1 ), where the average is over X from a 
Wishart ensemble with the correlations given by 0. It 
is also convenient to introduce generating functions for 
the spectral moments 0: 



Mi 



(2) 



fc=i 



fc=i 



where Met = Jpc(A)A fe <iA and analogously for ra ck . 
The generating functions are directly related to the re- 
solvents: M(z) — zG(z) — 1 and m(z) — zg(z) — 1. 

One can show [j| that if N, T — > oo with fixed r = 
N/T, the function m(z) can be calculated from M(Z) as 
follows: 



where: 



m{z) = M(Z), 



z = Z(l + rM{Z)). 



(3) 



(4) 



Assume that we know the correlation matrix C. We can 
calculate eigenvalues of C, and their multiplicities n& 
(degeneracies), and determine M{Z): 



K 



M(Z)=J2 



k=l 



Z-\t 



(5) 



where pk = nt/N is the fraction of all eigenvalues taking 
the value A&. For the later convenience we also assume 
that the eigenvalues are ordered Ai < • ■ ■ < \k- 

For the given M(Z) we can apply the equations © 
and to determine m(z) and g(z) = (1 + m(z))/z and 
further, from g(z) we can calculate the eigenvalue density 
p c (x) by taking the imaginary part of g(z) along the cuts 
on the real axis: p c {x) = — (l/7r)Im g(x + i0 + ). 

The equations 10 and Q are equivalent to the 
Marcenko-Pastur equation [3. Let us now discuss their 
physical content. 
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FIG. 1: Graphical representation of the map z — z(Z). Each pair of points: Z = l + Re^ and Z' = 1 + {r/R)e~ i * is mapped 
onto the same point z — z' . While <f> increases, Z and Z' move along the circles in the directions given by arrows. The critical 
horizon (dotted circle) has radius y/r. The image of a circle \Z — 1| = R is an ellipse in z-plane, which for R — > y/r degenerates 
to an interval (dotted line). 




FIG. 2: The inverse function Z = Z(z): the z-plane without the real interval [ac_,x+] is mapped onto the outside of the critical 
circle in the Z-plane. The interval is mapped into the upper (or lower) part of the critical semicircle. 



The equation @ tells us that the function m(z) as- 
sumes for r = the same form as the function M(Z) 
since in this case z = Z as follows from the Eq. J3J. 

In this case the spectrum of the experimental matrix c 
is identical as of the matrix C, as expected for T — > oo. 
However for r > 0, z — z(Z) is a nontrivial function 
and the relation of the generating function m{z) to the 
original one M(Z) is not simple. The function M(Z) has 
poles on the real axis at A^'s. We may ask whether one 
can see the traces of the poles also in the function m(z). 

In order to answer this question first consider the sim- 
plest example Ai = A2 = . . . = A^v = 1. The function 
M(Z) = 1/(Z — 1) has only one pole and the conformal 
map Q takes the form: 

Consider a point on the Z-plane in the distance R from 
the pole: Z = 1 + Re 1 ^. The function JJJJ assumes the 
value z — 1 + r + Re 1 ^ + (r/R)e~' 1 ^ the same as for a 
point Z' = l+(r / ' R)e~ 1 ^ '. As a consequence each point Z 
outside the circle \Z— 1| = y/r has a partner Z' inside this 
circle such that z(Z) = z(Z') as depicted in Fig. ^ On 
the limiting circle there are pairs of points: Z = 1+y/re 1 '*' 
and Z' = Z = 1 + y/re^ 1 ^ for which z assumes the same 
values. We see that the upper part of the semicircle is 
mapped onto an interval x± = (1 ± y/r) 2 on 

the real axis in the z-plane, and so is the lower semicircle. 



Inverting the function z = z(Z) one obtains a two- valued 
function Z — Z(z) and thus one has to decide which 
Riemann sheet to choose. The physical Riemann sheet 
corresponds to the outside of the circle \Z — 1| = y/r as 
follows from the expansion J5J). For the inverse function 
Z = Z(z) the image of the whole z-plane is given by 
the outside of the circle \Z — 1| = y/r plus the upper 
semicircle which is an image of the cut (see Fig. |5J). We 
thus see that the pole in the Z-plane is surrounded by a 
critical horizon behind which the argument of the inverse 
function Z = Z(z) never enters. Only when the horizon 
radius y/r shrinks to zero the variable Z can approach 
the "naked" pole. 

As mentioned before, both semicircles of the critical 
horizon \Z — 1| = y/r are mapped by the transformation 
Q onto the same interval [x-, x+] on the real axis which 
makes the inverse function Z = Z(z) two- valued. To 
make out of it a single- valued one one has to restrict the 
image to cither the upper or lower semicircle. This can 
be done by approaching the cut from above: z — x + i0 + 
to obtain the upper semicircle or from below z — x + i0~ 
for the lower semicircle (Fig. 0). The imaginary part 
of g(z) = (1 + M(Z(z))) jz gives, for z = x + i0 + , the 
eigenvalue density p c (x). 

We see that the shape of the eigenvalue density p c (x) 
is encoded in the behavior of the conformal map J3J near 
the critical horizon. This is true in the general case JSJ. 
The poles of M(Z) are distributed on the positive real 
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semi-axis at A^'s. For r > they are shielded behind a 
critical horizon (see Fig. [3]). In general there are K + 1 
Riemann sheets. We are interested in the one for which 
1/z -> when 1/Z -> in Eq. ©. Thus the physical 
region lies outside the critical horizon. It is symmetric 
about the real axis. The upper part (Im Z > 0) is mapped 
by Eq. Q onto intervals on the real axis in the z-plane 
and so is the lower one. When r is positive but very small 
each pole has its own horizon, but when r is increased the 
individual horizons grow and merge so that the number 
of connected components of the total horizon decreases 
with r (see Fig. |3J). For sufficiently large r a single critical 
horizon surrounds all poles. The corresponding image of 
the horizon on the z-plane has a single cut on the real 
axis. For the limiting case r = 1, the cut touches the 
origin at z = 0. 

The discussion presented above can be turned into a 
practical method of determining the shape of the eigen- 
value distribution p c (x). We have to find the inverse 
function Z = Z(z) to z — z{Z) |0J, insert the solution to 
M(Z) and determine the behavior of the resulting func- 
tion along the cuts on the real axis. The equation (0} can 
be analytically solved for Z only in few cases. In the gen- 
eral case one has to use a numerical method. Fortunately, 
one can bypass the problem of the explicit function's in- 
version by using a method described below. 

In the first step we determine the critical horizon. 
Then moving gradually along its upper part (Im Z > 0) 
we calculate z = z(Z) Q which is in this case real (z — 
x + i0 + ) and simultaneously p c (Z) = -(l/7r)Im (M(Z) + 
l)/z(Z) = — (l/7r)Im(m(z) + l)/z which gives the eigen- 
value density p c (%) = Pc(Z). Briefly speaking, the 
method is to use the auxiliary variable Z on the upper 
part of the horizon to parametrize both the eigenvalue 
x = z(Z) and p c (Z), and to treat the pairs (x(Z), p c (Z)) 
as p c {x). If the horizon has many disconnected parts, 
also the cut and the support of the eigenvalue density 
function p c (x) consists of many disconnected intervals. 

Let us describe how to determine the critical horizon. 
If Z = X + iY is a point on the horizon, the imaginary 
part of the map z = z(Z) Q vanishes and we have: 



K 

E 

fe=i 



PfcA 



1 



(x - A fc )2 + y2 



(7) 



This equation has to be solved for Y = Y(X) for given 
X. If the real solution exists it has two symmetric roots 
±Y. We are interested in the non-negative one Y > 
which corresponds to Z on the upper part of the critical 
horizon. Now we can calculate x = z(Z) and p c (Z): 

x{Z) = X + rf>A» + r f) ^ " , (8) 



fc=i 



and 



K 



p c {Z) = > 

7TT ' 



^(x-A fc ) 2 + r 2 ' 



(9) 



which, after ignoring Z being a parameter, gives the pair 
(x, p c {x)). The two equations above are explicit, so one 
can directly compute x and p c for any given Z if one 
knows the dependence Y(X). This dependence can be 
obtained by solving the Eq. (J7J) for Y(X). 

The solution of Eq. 10 depends on r. The function on 
the left-hand side of 10 has a shape of a 'hilly' landscape 
when plotted on Z-plane, with K peaks of infinite height 
located at the poles (A& , 0) . The circumference of each 
peak shrinks to zero for increasing altitude. The solution 
of the equation lies on curves obtained as a cross-section 
of the hilly landscape at height 1/r. One can think of a 
coast line around islands surrounded by a sea with the 
water level 1/r. For r — ► + , the level goes to infinity, 
and thus the cross-section contains only the points at 
which the peaks are located: (Afc,0). For r > 0, the 
level 1/r is finite, so the cross-section contains closed lines 
surrounding the poles and forming the critical horizon. 
When the water level drops the islands merge and the 
coast line grows (Fig. 

One can find the points where the horizon crosses the 
real axis by setting Y = in Eq. 0. Among those 
points, the leftmost A_ and the rightmost X + set the 
limits on the minimal and maximal value of X where one 
has to look for the horizon solution. The two points are 
mapped by Eq. (J3J onto x± being the lower and upper 
edge of the eigenvalue spectrum p c (x). The values X_ 
and X + can be easily found by a root finder algorithm 
applied to the Eq. JJJ with Y = 0. Having determined 
X_ and X + one can change X in small steps from X_ to 
X + and for each X find the positive solution Y of Eq. (JJJ 
to eventually determine Z = X + iY(X) on the horizon. 
In that way the problem is solved. 

Let us now discuss examples. First we consider a cor- 
relation matrix C having three eigenvalues {1,2,6} with 
equal weights pk — 1/3. The critical horizon for differ- 
ent r as well as corresponding eigenvalue spectra p c (x) 
are shown in Fig. [3] The evolution of the critical horizon 
with 1 jr can be treated as a level map for the landscape 
0. In the inset of Fig. |3]we compare the spectrum for 
r = 1/3 with a spectrum obtained by diagonalization of 
n = 3 • 10 5 matrices of size N = 48 and T = 144 cho- 
sen randomly from the corresponding Wishart ensemble. 
The agreement is perfect, up to a finite-size corrections 
near the edges of the cut. 

The second example is related to some practical prob- 
lem. The portfolio selection is a central problem of quan- 
titative finance. The importance of the random matrix 
theory for this problem has been recently discovered |3j. 
It has been realized that the lower part of the spectrum 
p c has a universal shape stemming from statistical fluc- 
tuations. The method which we described above gives a 
refined tool allowing one to observe a fine structure of the 
spectrum of the empirical matrix also in its lower part. 
As an example we consider a correlation matrix obtained 
for returns of 18 stocks on the Polish Stock Market. The 
eigenvalues read: 0.36, 0.38, 0.47, 0.48, 0.56, 0.59, 0.64, 
0.66, 0.68, 0.71, 0.78, 0.83, 0.89, 0.94, 0.95, 1.08, 1.16, 
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FIG. 3: Top: Critical horizon for {A,} = {1, 2, 6} and for r = 
1/100 (solid), r = 1/10 (dashed), r = 1/3 (dotted) and r = 
1/2 (dash-dot). Bottom: The corresponding spectra p c (x). 
Inset: the spectrum for r — 1/3, calculated (solid line) and 
found experimentally (dotted) for sample of 3 x 10 matrices 
of size N = 48. 



trices. The agreement is very good. This means that 
already for N of order 20 the large N limit, in which 
the analytical equations have been derived, applies. De- 
viations are observed only at the edges of the individual 
parts of the spectrum. 

Statistical properties of random matrices and complex 
analysis are very intertwined. We used here this interre- 
lation, or more precisely a conformal map representation 
of the Marcenko-Pastur equations, to derive a general, 
practical method for the determination of the eigenvalue 
distribution for the standard estimator of the covariance 
matrix. This method can also be applied to the corre- 
sponding equations [lOj for the case of a correlated sam- 
ple where the delta function 6 a p in Eq. is substituted 
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FIG. 4: Solid line: p c (x) for N — 18 eigenvalues of C taken 
from Polish Stock Market, for T = 54 <-> r = 0.333 and 
T — 255 <-> r = 0.0706, calculated using the method described 
in this paper. Dashed line: p c (x) obtained by diagonalization 
of 3 ■ 10 J Wishart matrices generated by a Monte-Carlo pro- 
cedure. 



5.84, they are normalized to ^ fe Xk = N. In Fig. 0|we 
show the expected eigenvalue spectrum of c for N = 18 
and for T = 54 (r = 0.333) and T = 255 (r = 0.0706). 
For small r a fine structure emerges in the spectrum. In 
figure the spectrum is also compared to the one obtained 
by diagonalization of n — 3 x 10 5 correlated Wishart ma- 



by a correlation matrix A a p ^ 8 a @- 

This work was partially supported by the Polish State 
Committee for Scientific Research (KBN) grant 2P03B- 
08225 (2003-2006) and Marie Curie Host Fellowship 
HPMD-CT-2001-00108 and by EU 1ST Center of Excel- 
lence "COPIRA". 



[1] R. Miiller, IEEE Transactions on Information Theory, 

48, 2495 (2002). 
[2] S.E. Skipetrov, Phys. Rev. E67, 036621 (2003). 
[3] J. P. Bouchaud, P. Cizeau, L. Laloux and M. Potters, 

Phys. Rev. Lett. 83, 1467 (1999). 
[4] P.P. Mitra and A.M. Sengupta, Phys. Rev. E60, 3389 

(1999). 

[5] Z. Burda, A. Gorlich, A. Jarosz, J. Jurkiewicz, Physica 

A343, 295 (2004). 
[6] J. Wishart, Biometrika A20, 32 (1928). 
[7] V.A. Marcenko and L.A. Pastur, Math. USSR-Sb. 1, 457 



(1967) 

[8] ZD. Bai and J.W. Silverstein, J. Multivariate Anal. 54, 
175 (1995). 

[9] S.I. Choi and J.W. Silverstein, J. Multivariate Anal. 54, 
295 (1995). 

[10] Z. Burda, J. Jurkiewicz and B. Waclaw, Phys. Rev. E71, 
026111 (2005). 

[11] J. Feinberg, A. Zee, Jour. Stat. Phys. 87, 473 (1997). 
[12] For simplicity we will assume that Vi : (xi) — 0. 
[13] One can also consider complex matrices. The large iV- 
limit is in this case identical as for real matrices. 



